
feat: attach Java dedup agent in sample #133

Merged
officialasishkumar merged 20 commits into main from codex/java-dedup-agent
Apr 30, 2026

@officialasishkumar officialasishkumar commented Apr 29, 2026

Description

Updates the Java dynamic dedup samples so applications use the Keploy SDK as a runtime Java agent instead of compiling against it. The sample Docker images expose application class roots so dedup signatures are based on application code rather than shaded dependencies, and the PR now includes a small plain-Java smoke sample for Java 8 and Java 17 validation.

Key changes:

  • Uses io.keploy:keploy-sdk:2.0.6, the latest published release that ships the shaded dedup agent jar, as the copied runtime Java agent.
  • Adds Spring Boot Java dedup sample wiring for runtime keploy-sdk.jar agent attachment.
  • Adds Dropwizard/Jersey dedup sample coverage for non-Spring Java apps.
  • Adds simple-java-dedup, a dependency-free JDK HTTP server with checked-in fixtures for native and Docker Java 8/17 smoke testing.
  • Checks in simple-java-dedup/dedupData.yaml and simple-java-dedup/duplicates.yaml so the expected dedup artifact shape is visible with the sample.
  • Covers jar, classpath, Docker, distroless, restricted Docker, and restricted classpath launch styles across the Java dedup samples.
  • Adds checked-in Keploy fixtures so Enterprise CI can replay instead of recording during CI:
    • java-dedup/keploy/test-set-{0..3}: 400 Spring fixtures across 4 sets.
    • dropwizard-dedup/keploy/test-set-0: 200 Dropwizard fixtures in 1 set.
    • simple-java-dedup/keploy/test-set-0: 14 plain-Java fixtures with four intentional duplicate coverage pairs.
  • Adds dropwizard-dedup/run_random_200.sh as the traffic-generator companion for fixture refreshes.
  • Uses stable sample jar names: target/java-dedup.jar, target/dropwizard-dedup.jar, and target/simple-java-dedup.jar.
  • Adds Docker class-root fixes so Docker and distroless images copy target/classes with KEPLOY_JAVA_CLASS_DIRS=/app/classes.
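
As a rough illustration of the runtime-attach approach listed above, the sketch below assembles a JVM launch command with the standard `-javaagent` flag and the `KEPLOY_JAVA_CLASS_DIRS` variable named in this PR. The helper name `build_launch`, its defaults, and the exact argument layout are assumptions for illustration only; the sample READMEs remain the authority on the real commands.

```python
import os
from typing import Dict, List, Tuple


def build_launch(
    app_jar: str,
    agent_jar: str = "target/keploy-sdk.jar",
    class_dirs: str = "target/classes",  # assumed default for a native run
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for starting the app with the agent attached at runtime.

    Hypothetical helper: it only shows the shape of a -javaagent launch.
    """
    argv = ["java", f"-javaagent:{agent_jar}", "-jar", app_jar]
    env = dict(os.environ)
    # Expose application class roots so signatures come from application code.
    env["KEPLOY_JAVA_CLASS_DIRS"] = class_dirs
    return argv, env


if __name__ == "__main__":
    argv, env = build_launch("target/java-dedup.jar")
    print(" ".join(argv))
    print("KEPLOY_JAVA_CLASS_DIRS=" + env["KEPLOY_JAVA_CLASS_DIRS"])
```

For the Docker and distroless images, the same helper would be called with `class_dirs="/app/classes"`, matching the value set in the Dockerfiles.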

Fixes # NA

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

  • git diff --check
  • Sample READMEs reference keploy.agent.version=2.0.6.
  • End-to-end smoke for Spring and Dropwizard samples on JDK 8 / 17 / 21:
    • mvn -DskipTests -Dkeploy.agent.version=2.0.6 clean package resolves the artifact from Maven Central and copies it to target/keploy-sdk.jar.
    • The copied jar's manifest carries Premain-Class: io.keploy.dedup.KeployDedupAgent and Implementation-Version: 2.0.6.
  • Simple plain-Java sample validated locally with Keploy Enterprise and a local API stub:
    • Java 8 native: 14/14 replayed, 10 retained, 4 duplicates.
    • Java 8 Docker: 14/14 replayed, 10 retained, 4 duplicates.
    • Java 17 native: 14/14 replayed, 10 retained, 4 duplicates.
    • Java 17 Docker: 14/14 replayed, 10 retained, 4 duplicates.
    • Duplicate IDs in all four runs: test-3, test-7, test-9, test-13, exactly one from each intentional duplicate pair.
  • Checked-in simple sample dedup artifacts were regenerated from a Java 8 native replay and normalized to repo-relative source paths; validation confirmed 14 covered tests, 10 retained tests, and the same four duplicate IDs.
  • Dropwizard Docker context smoke: docker compose build succeeds against Dockerfile, Dockerfile.classpath, and Dockerfile.distroless after the .dockerignore allowlist fix.
  • Enterprise CI at keploy/enterprise#1959 validates exact dedup counts with these fixtures:
    • Spring: 400/400 replayed, 18 retained, 382 duplicates.
    • Dropwizard: 200/200 replayed, 17 retained, 183 duplicates.
  • Dropwizard Docker dedup was independently recomputed from generated dedupData.yaml; generated duplicates.yaml matched with 0 false-positive duplicates and 0 missing duplicates.
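
The independent recomputation mentioned above can be pictured with a minimal sketch like the one below. It assumes each test's coverage has already been loaded into memory as a set of (source file, line) pairs; the real dedupData.yaml schema is not reproduced here. It also assumes the first test seen with a given signature is the one retained, which may differ from Keploy's actual policy.

```python
from typing import Dict, FrozenSet, List, Tuple

Coverage = FrozenSet[Tuple[str, int]]  # (source file, line number) pairs


def dedup(coverage_by_test: Dict[str, Coverage]) -> Tuple[List[str], List[str]]:
    """Split tests into (retained, duplicates) by identical coverage signature.

    Assumption: the first test seen with a signature is retained and later
    tests with the same signature are reported as duplicates.
    """
    seen = set()
    retained: List[str] = []
    duplicates: List[str] = []
    for test_id, signature in coverage_by_test.items():
        if signature in seen:
            duplicates.append(test_id)
        else:
            seen.add(signature)
            retained.append(test_id)
    return retained, duplicates


if __name__ == "__main__":
    # Made-up coverage data, not taken from the checked-in fixtures.
    sample = {
        "test-1": frozenset({("Server.java", 10), ("Server.java", 11)}),
        "test-2": frozenset({("Server.java", 20)}),
        "test-3": frozenset({("Server.java", 10), ("Server.java", 11)}),
    }
    retained, duplicates = dedup(sample)
    print("retained:", retained)
    print("duplicates:", duplicates)
```

A useful sanity check on any run is that retained plus duplicates equals the number of replayed tests, which holds for the counts reported here (10 + 4 = 14, 18 + 382 = 400, 17 + 183 = 200).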

Additional Context

Related PRs:

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added smoke and workflow coverage that proves the feature works
  • Any dependent changes have been merged and published in downstream modules
  • I have checked my code and corrected any misspellings
  • I have tagged the reviewers in a comment below in case my pull request is ready for a review
  • I have signed the commit message to agree to Developer Certificate of Origin (DCO) by adding "--signoff" to my git commit command.

@officialasishkumar officialasishkumar marked this pull request as ready for review April 30, 2026 04:30
@officialasishkumar officialasishkumar force-pushed the codex/java-dedup-agent branch 3 times, most recently from a61e214 to 347d2e6 Compare April 30, 2026 05:35

@khareyash05 khareyash05 left a comment

LGTM

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
2.0.6 is the latest io.keploy:keploy-sdk on Maven Central. Bump the
older example pins (2.0.2) to it so copy-pasted snippets do not land
on a stale release.

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
The pom sets <finalName>dropwizard-dedup</finalName>, so `mvn package`
produces target/dropwizard-dedup.jar. The allowlist had the stale
`dropwizard-dedup-0.0.1-SNAPSHOT.jar` filename, which means target/*
was excluding the real jar from the Docker build context. BuildKit
then died with "failed to compute cache key: ... target/dropwizard-dedup.jar:
not found" the moment Dockerfile (and Dockerfile.distroless) tried
to COPY it. Verified locally that all three Dockerfiles (Dockerfile,
Dockerfile.classpath, Dockerfile.distroless) build cleanly after
this fix.

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
The previous Dropwizard dedup contract was 16 tests in a single set
with both EXPECTED_DUPLICATES=any and EXPECTED_RETAINED_TESTS=any —
it only verified "something happened", not that dedup picked the
right tests. Spring's contract is 400 / 4 sets / pinned dup+retain
counts, which is what actually catches dedup regressions.

Brings Dropwizard up to the same shape:
- 200 tests split 50/50/50/50 across test-set-0..test-set-3.
- Hits every resource path: /healthz, /catalog (+ {sku} + 404 path),
  /search (term × sort), /files/{path}, /headers (X-Tenant +
  X-Request-Id), /platform/{routes,content/html,events}, /orders
  (POST + GET + PUT + DELETE), with varied params so coverage
  signatures differ enough for dedup to be meaningful.
- Adds run_random_200.sh — same shape as java-dedup/run_random_1000.sh
  — so future re-records are reproducible.

Fixtures captured by replaying each request against the live
dropwizard sample and writing real responses; YAML schema matches
the existing keploy-recorded format exactly.
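
To make the "reproducible re-record" idea concrete, here is a small sketch of a seeded request plan. It is not the contents of run_random_200.sh, which is a shell script not shown in this conversation. The endpoint subset, the HTTP methods other than those listed for /orders, and the function name `plan_requests` are all assumptions.

```python
import random
from typing import List, Tuple

Request = Tuple[str, str]  # (HTTP method, path)

# A subset of the paths named in the commit message; methods for
# everything except /orders are assumed to be GET.
ENDPOINTS: List[Request] = [
    ("GET", "/healthz"),
    ("GET", "/catalog"),
    ("GET", "/search"),
    ("GET", "/headers"),
    ("GET", "/platform/routes"),
    ("POST", "/orders"),
]


def plan_requests(total: int, sets: int, seed: int = 0) -> List[List[Request]]:
    """Return `sets` equally sized request lists, reproducible for a given seed."""
    if sets <= 0 or total % sets != 0:
        raise ValueError("total must divide evenly across a positive number of sets")
    rng = random.Random(seed)  # local RNG keeps the plan independent of global state
    flat = [rng.choice(ENDPOINTS) for _ in range(total)]
    size = total // sets
    return [flat[i * size:(i + 1) * size] for i in range(sets)]


if __name__ == "__main__":
    plan = plan_requests(total=200, sets=4, seed=42)
    for index, batch in enumerate(plan):
        print(f"test-set-{index}: {len(batch)} requests, first = {batch[0]}")
```

Fixing the seed means a later re-record issues the same sequence of requests, so fixture diffs reflect behavior changes rather than a different random draw.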

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
Previous regen was missing three fields the canonical
keploy-record fixtures carry. Added them so the docker mode
replays cleanly:

- spec.assertions.noise.header.Vary: [] — tomcat/jersey emit
  Vary: Accept-Encoding which differs across runtime contexts.
- spec.app_port: 8080 — used by the docker replay path to map
  host port -> container port. Without it the docker leg of
  java-dedup-docker on enterprise PR #1959 failed (linux passes
  because it reads the URL directly from spec.req.url).
- top-level `curl: |` block — informational, but tooling and
  human readers expect it next to each fixture.

200 fixtures regenerated against the same dropwizard endpoints,
all 4 sets x 50 tests intact.

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
Previous regen wrote a placeholder `# empty mocks` mocks.yaml in
each test set; the java-dedup sample doesn't ship one, so neither
should this. Empty mocks aren't harmful but the asymmetry is noise.

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
The 200 dropwizard fixtures were captured with extra response
headers (Vary, Content-Length) and the 127.0.0.1:8080 host that
Python's http.client emits. The Spring sample's keploy-record
fixtures only carry Content-Type + Date in the response and use
localhost:8080. Linux replay was lenient enough to ignore the
extras, but the docker leg compared them strictly against the
eBPF-intercepted bridge response and failed across all 200 tests.

Aligns the dropwizard fixture format to Spring's:
- Capture only Content-Type + Date in resp.header.
- Drop header.Vary noise entry (no longer needed, header isn't captured).
- Use localhost:8080 in URLs and Host headers (matches Spring).

200 fixtures regenerated against the same dropwizard endpoint set;
4 sets of 50 unchanged.

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
Previous attempt stripped Vary + Content-Length from the captured
response header set to look like Spring's keploy-record output —
that broke linux replay too because keploy strict-compares the set
of response headers, not just values: dropwizard's Jetty actually
emits Vary + Content-Length on the wire, so a fixture that omits
them is "missing" a header from the live response and fails the
match across all 200 tests on both linux and docker.

Restores full header capture (Date, Content-Type, Vary, Content-
Length) and adds `header.Content-Length: []` to noise on top of
Date and Vary. That keeps the header set equal between fixture and
live response while letting docker-mode framing (chunked vs explicit
Content-Length) ignore the value drift that originally broke docker.
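
The matching behavior described in this commit (the set of header names must be equal, while values of noisy headers are ignored) can be sketched as follows. This is an illustration of the semantics as described here, not Keploy's actual matcher; the function name `headers_match` and the case-insensitive name handling are assumptions.

```python
from typing import Dict, Iterable


def headers_match(
    expected: Dict[str, str],
    actual: Dict[str, str],
    noise: Iterable[str] = (),
) -> bool:
    """Require identical header names; compare values only for non-noisy headers."""
    exp = {name.lower(): value for name, value in expected.items()}
    act = {name.lower(): value for name, value in actual.items()}
    if exp.keys() != act.keys():
        return False  # a header present on only one side fails the match
    noisy = {name.lower() for name in noise}
    return all(exp[name] == act[name] for name in exp if name not in noisy)


if __name__ == "__main__":
    # Illustrative values only; not copied from the checked-in fixtures.
    fixture = {
        "Date": "Wed, 29 Apr 2026 10:00:00 GMT",
        "Content-Type": "application/json",
        "Vary": "Accept-Encoding",
        "Content-Length": "42",
    }
    live = dict(fixture, Date="Thu, 30 Apr 2026 05:00:00 GMT", **{"Content-Length": "57"})
    noise = ["Date", "Vary", "Content-Length"]
    print(headers_match(fixture, live, noise))

    stripped = {"Date": fixture["Date"], "Content-Type": fixture["Content-Type"]}
    print(headers_match(stripped, live, noise))
```

The second call models the previous attempt: a fixture that omits Vary and Content-Length fails even though both are marked as noise, because the header name sets differ.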

Reverts the `127.0.0.1` -> `localhost` host swap from the previous
attempt as well; it was unrelated to the docker failure (keploy
sends the recorded Host header verbatim) and only added churn.

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
Root cause for the docker java-dedup failure (`expected 200 covered
tests, found 134` from common.sh assertion 419-420):

The keploy-v3 sidecar that reads /tmp/coverage_data.sock and writes
dedupData.yaml inside the container has a timer-based flush. Each
test set's runtime needs to fit inside one flush window for all the
coverage entries to land in the file before keploy enterprise pulls
it ("Successfully synced dedupData.yaml from container").

- Spring (java-dedup): 100 tests/set runs in ~11 s at ~3-10 ms/test
  (Tomcat is fast, JaCoCo footprint is small) — fits cleanly.
- Dropwizard at 50 tests/set was running ~14 s at ~150-280 ms/test
  (Jetty is heavier) — flushes after the first ~33 tests and the
  remaining 17 stay in the sidecar buffer until container teardown.

Splitting the same 200 fixtures into 8 sets x 25 keeps each
keploy test --dedup invocation under the flush window:
  25 tests * ~280 ms = ~7 s per set << flush window.

The fixture content is identical to before — just redistributed.
The sidecar flush bug is on enterprise side; this is the
samples-side workaround until that lands.
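
The per-set runtime estimate above is plain multiplication; a tiny sketch using the upper-bound latency quoted in this commit (~280 ms/test) is shown below. The function name is hypothetical, and the length of the flush window itself is not stated in this commit, so it is not modeled.

```python
def set_runtime_ms(tests_per_set: int, per_test_ms: int) -> int:
    """Estimated wall time for one test set, in milliseconds."""
    return tests_per_set * per_test_ms


if __name__ == "__main__":
    # 280 ms is the upper end of the per-test range quoted for Dropwizard.
    for tests in (50, 25):
        seconds = set_runtime_ms(tests, 280) / 1000
        print(f"{tests} tests/set -> ~{seconds} s")
```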

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
The keploy-v3 sidecar's async write to dedupData.yaml has a
~1.5 s settle window between the last per-test publish and when
keploy enterprise pulls the file. Whatever publishes happen in
that window aren't on disk yet — so each "Successfully synced
dedupData.yaml from container" call drops the trailing ~3 tests.

Going 4x50 -> 8x25 cut the loss from 66 -> 24 (3 per sync x 8 syncs).
Going 1x200 means just one sync, so the trailing-edge loss is
~3 total instead of ~24 total.

Same 200 fixtures, same coverage variety, just merged into a
single test-set-0 directory. Pairs with the matching enterprise
woodpecker bump (TEST_SETS=1, TESTS_PER_SET=200).
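
The trailing-edge arithmetic above can be written out as a short sketch. It assumes, as this commit does, a roughly constant loss of about 3 tests per sync; the function name is hypothetical and the model says nothing about the earlier 4x50 layout, whose loss of 66 had a different cause.

```python
def trailing_loss(syncs: int, lost_per_sync: int = 3) -> int:
    """Estimated number of tests missing from dedupData.yaml across all syncs."""
    return syncs * lost_per_sync


if __name__ == "__main__":
    for label, syncs in (("8x25", 8), ("1x200", 1)):
        print(f"{label}: ~{trailing_loss(syncs)} tests lost")
```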

Signed-off-by: Asish Kumar <officialasishkumar@gmail.com>
@officialasishkumar officialasishkumar merged commit f3fc534 into main Apr 30, 2026
2 checks passed

2 participants